18 research outputs found

    A Spanish dataset for reproducible benchmarked offline handwriting recognition

    Full text link
    [EN] In this paper, a public dataset for Offline Handwriting Recognition, along with an appropriate evaluation method to provide benchmark indicators at sentence level, is presented. This dataset, called SPA-Sentences, consists of offline handwritten Spanish sentences extracted from 1617 forms produced by the same number of writers. A total of 13,691 sentences comprising around 100,000 word instances out of a vocabulary of 3288 words occur in the collection. Careful attention has been paid to make the baseline experiments both reproducible and competitive. To this end, the experiments are based on state-of-the-art recognition techniques combining convolutional blocks with one-dimensional Bidirectional Long Short-Term Memory (LSTM) networks and Connectionist Temporal Classification (CTC) decoding. The scripts with the entire experimental setting have been made available. The SPA-Sentences dataset and its baseline evaluation are freely available for research purposes via the institutional university repository. We expect the research community to include this corpus, as is usually done with the English IAM and French RIMES datasets, in their battery of experiments when reporting novel handwriting recognition techniques.
    España Boquera, S.; Castro-Bleda, MJ. (2022). A Spanish dataset for reproducible benchmarked offline handwriting recognition. Language Resources and Evaluation. 56(3):1009-1022. https://doi.org/10.1007/s10579-022-09587-3
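    For orientation, the kind of optical model described above (convolutional blocks feeding a one-dimensional bidirectional LSTM trained with CTC) can be sketched as follows. This is a minimal, illustrative PyTorch sketch with assumed layer sizes and an assumed 64-pixel input height; it is not the authors' exact configuration.

        import torch
        import torch.nn as nn

        class ConvBLSTMCTC(nn.Module):
            # Illustrative optical model: conv blocks -> 1D BLSTM -> per-frame character posteriors.
            def __init__(self, n_chars, hidden=256):
                super().__init__()
                self.conv = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                )
                self.blstm = nn.LSTM(64 * 16, hidden, bidirectional=True, batch_first=True)
                self.out = nn.Linear(2 * hidden, n_chars + 1)   # +1 for the CTC blank symbol

            def forward(self, images):                 # images: (B, 1, 64, W)
                f = self.conv(images)                  # (B, 64, 16, W/4)
                f = f.permute(0, 3, 1, 2).flatten(2)   # frame sequence of length W/4
                h, _ = self.blstm(f)
                return self.out(h).log_softmax(-1)     # per-frame log-probabilities for CTC

        # Training would apply nn.CTCLoss to these log-probabilities and the target character sequences.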

    Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System

    Full text link
    [EN] Neural Network Language Models (NNLMs) are a successful approach to Natural Language Processing tasks, such as Machine Translation. We introduce in this work a Statistical Machine Translation (SMT) system which fully integrates NNLMs in the decoding stage, breaking with the traditional approach based on n-best list rescoring. The neural net models (both language models (LMs) and translation models) are fully coupled in the decoding stage, allowing them to influence the translation quality more strongly. Computational issues were solved by a novel idea based on memorization and smoothing of the softmax normalization constants to avoid their computation, which introduces a trade-off between LM quality and computational cost. These ideas were studied in a machine translation task with different combinations of neural networks used both as translation models and as target LMs, comparing phrase-based and N-gram-based systems, and showing that the integrated approach seems more promising for N-gram-based systems, even with non-full-quality NNLMs.
    This work was partially supported by the Spanish MINECO and FEDER funds under project TIN2017-85854-C4-2-R.
    Zamora Martínez, FJ.; Castro-Bleda, MJ. (2018). Efficient Embedded Decoding of Neural Network Language Models in a Machine Translation System. International Journal of Neural Systems. 28(9). https://doi.org/10.1142/S0129065718500077
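    The memorization-and-smoothing idea can be illustrated with a small sketch: exact softmax normalization constants are precomputed for frequent contexts and a smoothed default is used otherwise, so the decoder never pays the full normalization cost at run time. The class and function names below are assumptions for illustration, not the authors' implementation.

        import numpy as np

        class CachedSoftmaxLM:
            # Unnormalized NNLM scores plus a cache of softmax normalization constants,
            # falling back to a smoothed default constant on a cache miss.
            def __init__(self, score_fn, default_logZ):
                self.score_fn = score_fn          # context -> vector of unnormalized scores
                self.default_logZ = default_logZ  # smoothed constant used when a context is unseen
                self.cache = {}

            def precompute(self, contexts):
                for c in contexts:                # exact constants for frequent contexts
                    s = self.score_fn(c)
                    self.cache[tuple(c)] = np.log(np.exp(s - s.max()).sum()) + s.max()

            def logprob(self, word_id, context):
                s = self.score_fn(context)
                logZ = self.cache.get(tuple(context), self.default_logZ)
                return s[word_id] - logZ          # approximate log-probability, no full softmax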

    Towards a Universal Semantic Dictionary

    Full text link
    [EN] A novel method for finding linear mappings among word embeddings for several languages, taking as pivot a shared, multilingual embedding space, is proposed in this paper. Previous approaches learned translation matrices between two specific languages, while this method learns translation matrices between a given language and a shared, multilingual space. The system was first trained on bilingual corpora, and later on multilingual corpora as well. In the first case, two different training sets were used: Dinu's English-Italian benchmark data, and English-Italian translation pairs extracted from the PanLex database. In the second case, only the PanLex database was used. With the best setting, the system performs significantly better on the English-Italian pair than the baseline system of Mikolov, and it provides performance comparable to more sophisticated systems. Exploiting the richness of the PanLex database, the proposed method makes it possible to learn linear mappings among an arbitrary number of languages.
    This research was funded by Spanish MINECO and FEDER grant number TIN2017-85854-C4-2-R.
    Castro-Bleda, MJ.; Iklódi, E.; Recski, G.; Borbély, G. (2019). Towards a Universal Semantic Dictionary. Applied Sciences. 9(19):1-14. https://doi.org/10.3390/app9194060
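    As a point of reference, a Mikolov-style translation matrix into a shared space can be fitted by ordinary least squares and queried by nearest-neighbour search. This is a generic sketch with assumed array shapes; it is not the paper's code.

        import numpy as np

        def fit_translation_matrix(X_src, Y_shared):
            # Least-squares linear map W such that X_src @ W approximates Y_shared,
            # where rows of X_src are source-language embeddings and rows of
            # Y_shared are the corresponding vectors in the shared pivot space.
            W, *_ = np.linalg.lstsq(X_src, Y_shared, rcond=None)
            return W

        def nearest_in_shared_space(x, W, Y_vocab):
            # Map a source vector into the shared space and return the index of the
            # closest vocabulary vector by cosine similarity.
            y = x @ W
            sims = Y_vocab @ y / (np.linalg.norm(Y_vocab, axis=1) * np.linalg.norm(y) + 1e-9)
            return int(np.argmax(sims))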

    The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

    Full text link
    [EN] This paper presents the 'NoisyOffice' database. It consists of images of printed text documents with noise mainly caused by uncleanliness in a generic office, such as coffee stains and footprints on documents, or folded and wrinkled sheets with degraded printed text. This corpus is intended to train and evaluate supervised learning methods for cleaning, binarization and enhancement of noisy images of grayscale text documents. As an example, several image enhancement and binarization experiments using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository. Finally, a challenge organized by Kaggle Inc. to denoise images using the database is described in order to show its suitability for benchmarking image processing systems.
    This research was undertaken as part of the project TIN2017-85854-C4-2-R, jointly funded by the Spanish MINECO and FEDER funds.
    Castro-Bleda, MJ.; España Boquera, S.; Pastor Pellicer, J.; Zamora Martínez, FJ. (2020). The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing. The Computer Journal. 63(11):1658-1667. https://doi.org/10.1093/comjnl/bxz098
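    A supervised filter of the kind this corpus is meant to train can be sketched as a small convolutional network that maps a noisy grayscale image to its clean counterpart. The architecture below is an assumed, minimal example, not one of the networks evaluated in the paper.

        import torch
        import torch.nn as nn

        class DenoisingFilter(nn.Module):
            # Illustrative supervised image filter: noisy grayscale patch in, clean patch out.
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # pixel intensities in [0, 1]
                )

            def forward(self, noisy):
                return self.net(noisy)

        # Training sketch: minimize nn.MSELoss()(model(noisy_batch), clean_batch) over
        # noisy/clean pairs; thresholding the output at 0.5 gives a simple binarization.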

    Fallback Variable History NNLMs: Efficient NNLMs by precomputation and stochastic training

    Full text link
    [EN] This paper presents a new method to reduce the computational cost of using Neural Networks as Language Models during recognition, in some particular scenarios. It is based on a Neural Network that considers input contexts of different lengths in order to ease the use of a fallback mechanism together with the precomputation of softmax normalization constants for these inputs. The proposed approach is empirically validated, showing its capability to emulate lower-order N-grams with a single Neural Network. A machine translation task shows that the proposed model constitutes a good solution to the normalization cost of the output softmax layer of Neural Networks, for some practical cases, without a significant impact on performance while improving the system speed.
    This work was partially supported by the Spanish MINECO and FEDER funds under project TIN2017-85854-C4-2-R (to MJCB). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Zamora Martínez, FJ.; España Boquera, S.; Castro-Bleda, MJ.; Palacios Corella (2018). Fallback Variable History NNLMs: Efficient NNLMs by precomputation and stochastic training. PLoS ONE. 13(7). https://doi.org/10.1371/journal.pone.0200884
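    The fallback mechanism can be pictured as follows: the decoder shortens the history until it finds a context whose normalization constant was precomputed, which is what lets a single network emulate lower-order N-grams. This is an illustrative sketch with assumed names, not the paper's implementation.

        def fallback_logprob(word_id, context, score_fn, logZ_table):
            # Try the longest suffix of the context for which a softmax normalization
            # constant has been precomputed; the empty context acts as the last resort.
            for n in range(len(context), -1, -1):
                key = tuple(context[len(context) - n:])
                if key in logZ_table:
                    return score_fn(key)[word_id] - logZ_table[key]
            raise KeyError("no precomputed constant available, not even for the empty context")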

    Human or computer assisted interactive transcription: automated text recognition, text annotation, and scholarly edition in the twenty-first century

    Full text link
    [EN] Computer assisted transcription tools can speed up the initial process of reading and transcribing texts. At the same time, new annotation tools open new ways of accessing the text in its graphical form. The balance and value of each method still needs to be explored. STATE, a complete assisted transcription system for ancient documents, was presented to the audience of the 2013 International Medieval Congress at Leeds. The system offers a multimodal interaction environment to assist humans in transcribing ancient documents: the user can type, write on the screen with a stylus, or utter a word. When one of these actions is used to correct an erroneous word, the system uses this new information to look for other mistakes in the rest of the line. The system is modular, composed of different parts: one part creates projects from a set of images of documents, another part controls an automatic transcription system, and the third part allows the user to interact with the transcriptions and easily correct them as needed. This division of labour allows great flexibility for organising the work in a team of transcribers.
    Work supported by the Spanish Government (TIN2010-18958) and the Generalitat Valenciana (Prometeo/2010/028).
    Castro-Bleda, MJ.; Vilar Torres, JM.; España Boquera, S.; Llorens, D.; Marzal Varó, A.; Prat, F.; Zamora Martínez, FJ. (2014). Human or computer assisted interactive transcription: automated text recognition, text annotation, and scholarly edition in the twenty-first century. Mirabilia Journal. 18(1):247-253. http://hdl.handle.net/10251/61398
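    The correction propagation described above (a user-validated word constraining the rest of the line) can be pictured with a very small sketch: the validated prefix is used to re-score alternative hypotheses for the remaining text. The function below is purely illustrative, assumes an externally provided scoring function, and is not part of STATE.

        def rescore_after_correction(validated_prefix, suffix_hypotheses, score):
            # Once the user fixes a word, every candidate continuation of the line is
            # re-scored conditioned on the validated prefix, and the best one is proposed.
            return max(suffix_hypotheses, key=lambda suffix: score(validated_prefix + suffix))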

    F-Measure as the error function to train neural networks

    Full text link
    Imbalanced datasets pose serious problems in machine learning. For many tasks characterized by imbalanced data, the F-Measure seems more appropriate than the Mean Square Error or other error measures. This paper studies the use of the F-Measure as the training criterion for Neural Networks by integrating it into the Error-Backpropagation algorithm. This novel training criterion has been validated empirically on a real task for which the F-Measure is typically applied to evaluate quality. The task consists of cleaning and enhancing ancient document images, which is performed, in this work, by means of neural filters.
    This work has been partially supported by MICINN project HITITA (TIN2010-18958) and by the FPI-MICINN (BES-2011-046167) scholarship from Ministerio de Ciencia e Innovación, Gobierno de España.
    Pastor Pellicer, J.; Zamora Martínez, FJ.; España Boquera, S.; Castro-Bleda, MJ. (2013). F-Measure as the error function to train neural networks. In Advances in Computational Intelligence. Springer Verlag (Germany). 376-384. https://doi.org/10.1007/978-3-642-38679-4_37
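    To make the idea concrete, one common way to use the F-Measure inside error backpropagation is a "soft" F-Measure computed from continuous network outputs, so that precision, recall, and F remain differentiable. The formulation below is a standard illustrative version, not necessarily the exact one derived in the paper.

        import torch

        def soft_f_measure_loss(pred, target, eps=1e-7):
            # pred: network outputs in [0, 1]; target: binary ground truth (e.g. clean pixels).
            tp = (pred * target).sum()                      # soft true positives
            precision = tp / (pred.sum() + eps)
            recall = tp / (target.sum() + eps)
            f = 2 * precision * recall / (precision + recall + eps)
            return 1.0 - f                                  # minimizing this maximizes the F-Measure

        # Usage sketch: loss = soft_f_measure_loss(torch.sigmoid(logits), clean_pixels)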

    Improving offline handwritten text recognition with hybrid HMM/ANN models

    Full text link
    This paper proposes the use of hybrid Hidden Markov Model (HMM)/Artificial Neural Network (ANN) models for recognizing unconstrained offline handwritten texts. The structural part of the optical models has been modeled with Markov chains, and a Multilayer Perceptron is used to estimate the emission probabilities. This paper also presents new techniques to remove slope and slant from handwritten text and to normalize the size of text images with supervised learning methods. Slope correction and size normalization are achieved by classifying local extrema of text contours with Multilayer Perceptrons. Slant is also removed in a nonuniform way by using Artificial Neural Networks. Experiments have been conducted on offline handwritten text lines from the IAM database, and the recognition rates achieved are, in comparison to the ones reported in the literature, among the best for the same task.
    The authors acknowledge the valuable help provided by Moises Pastor, Juan Miguel Vilar, Alex Graves, and Marcus Liwicki. Thanks are also due to the reviewers and the Editor-in-Chief for their many valuable comments and suggestions. This work has been partially supported by the Spanish Ministerio de Educación y Ciencia (TIN2006-12767) and by the BPFI 06/250 Scholarship from the Conselleria d'Empresa, Universitat i Ciencia, Generalitat Valenciana.
    España Boquera, S.; Castro-Bleda, MJ.; Gorbe Moya, J.; Zamora Martínez, FJ. (2011). Improving offline handwritten text recognition with hybrid HMM/ANN models. IEEE Transactions on Pattern Analysis and Machine Intelligence. 33(4):767-779. https://doi.org/10.1109/TPAMI.2010.141
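    Hybrid HMM/ANN decoding relies on a standard trick: the Multilayer Perceptron estimates state posteriors P(s|x), which are divided by the state priors P(s) to obtain scaled emission likelihoods p(x|s) up to a constant factor. A minimal sketch of that step, with assumed array shapes, is:

        import numpy as np

        def scaled_log_likelihoods(log_posteriors, log_priors):
            # log_posteriors: (frames, states) from the MLP; log_priors: (states,)
            # estimated from the training alignments. The difference gives the scaled
            # emission log-likelihoods used by the Viterbi decoder of the HMM.
            return log_posteriors - log_priors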

    Neural network language models for off-line handwriting recognition

    Full text link
    [EN] Unconstrained off-line continuous handwritten text recognition is a very challenging task which has recently been addressed by different promising techniques. This work presents our latest contribution to this task, integrating neural network language models in the decoding process of three state-of-the-art systems: one based on bidirectional recurrent neural networks, another based on hybrid hidden Markov models and, finally, a combination of both. Experimental results obtained on the IAM off-line database demonstrate that consistent word error rate reductions can be achieved with neural network language models when compared with statistical N-gram language models on the three tested systems. The best word error rate, 16.1%, reported with ROVER combination of systems using neural network language models significantly outperforms current benchmark results for the IAM database.
    The authors wish to acknowledge the anonymous reviewers for their detailed and helpful comments on the paper. We also thank Alex Graves for kindly providing us with the BLSTM Neural Network source code. This work has been supported by the European project FP7-PEOPLE-2008-IAPP: 230653, the Spanish Government under project TIN2010-18958, as well as by the Swiss National Science Foundation (Project CRSI22_125220).
    Zamora Martínez, FJ.; Frinken, V.; España Boquera, S.; Castro-Bleda, MJ.; Fischer, A.; Bunke, H. (2014). Neural network language models for off-line handwriting recognition. Pattern Recognition. 47(4):1642-1652. https://doi.org/10.1016/j.patcog.2013.10.020
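    A common way to bring a neural network language model into such a decoding process is to interpolate its probabilities with those of the N-gram language model. The snippet below is a generic illustration of that combination with an assumed interpolation weight; the paper's exact integration may differ.

        import math

        def interpolated_logprob(p_nnlm, p_ngram, lam=0.5):
            # Linear interpolation of a neural LM and an N-gram LM probability for the
            # same word and history; lam is tuned on validation data.
            return math.log(lam * p_nnlm + (1.0 - lam) * p_ngram)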